A stochastic model incorporates randomness; given the same inputs, a stochastic model may generate different outputs on different runs.
This is useful when the process you want to model is fundamentally stochastic, or if there is heterogeneity in its dynamics.
Unfortunately stochastic models can get complicated, and come with a lot of notation and formalism. Our goal here is to quickly review basic probability theory so that we can use stochastic models productively.
A random variable is a variable whose value is random. In this course, random variables will have a real-world meaning, like:
Probability \(\Pr(\cdot)\) is a function whose argument is a value a random variable can take on, and which returns a number between 0 and 1. The collection of these probabilities over all possible values is the probability distribution.
For discrete random variables, we will write the probability mass function \(\Pr(X=x)\). For continuous random variables, we will use the density, defined below.
You can interpret probability statements in terms of frequency, e.g., how often is \(X=x\) in repeated trials?
The cumulative distribution function (CDF) is \(F(x) = \Pr(X \le x)\).
The density of a continuous random variable is \(f(x)=F'(x)\), the derivative of the CDF \(F(x)\) of \(X\), where it exists.
The density is like the probability mass function, but for continuous random variables.
\(\Pr(Y=y|X=x)\) is the probability that \(Y=y\), given that \(X=x\). We interpret the statement on the right-hand side of the conditioning sign \(|\) as deterministic. The definition of conditional probability says how to compute these:
\[ \Pr(Y=y|X=x) = \frac{\Pr(Y=y,X=x)}{\Pr(X=x)} \]
Example: Let \(Y\) be HIV status and let \(X\) be age. \(\Pr(Y=1|X=x)\) is the HIV prevalence among people aged \(x\).
The expected value of a random variable is its “average” value.
\[ E[X] = \sum_x x \Pr(X=x) \]
or
\[ E[X] = \int_{-\infty}^\infty x f(x) dx \]
where \(f(x)\) is the density of \(X\). The expectation is the average of values \(x\) that \(X\) can take on, weighted by their probability.
Can a random variable \(X\) have expectation equal to a value that \(X\) can never take? Yes. If \(X\sim \text{Bernoulli}(1/2)\), then \(\Pr(X=x)=1/2\) for \(x=1\) and \(x=0\). But \(E[X]=1/2\).
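A quick R simulation makes this concrete (the seed and sample size are arbitrary):

```r
# Simulate many Bernoulli(1/2) draws; every draw is 0 or 1,
# but the sample mean approaches E[X] = 1/2.
set.seed(1)
x <- rbinom(1e5, size = 1, prob = 0.5)
mean(x)        # close to 0.5
any(x == 0.5)  # FALSE: X never actually takes the value 1/2
```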
Conditional expectations are just expectations with respect to a conditional probability distribution, \[ E[Y|X=x] = \sum_y y\Pr(Y=y|X=x) \] \[ E[Y|X=x] = \int_{-\infty}^\infty y f(y|X=x) dy \]
You can calculate “overall” or “marginal” probabilities by computing a weighted sum over conditional probabilities, the law of total probability:
\[ \Pr(X=x) = \sum_y \Pr(X=x|Y=y) \Pr(Y=y) \] \[ \Pr(X=x) = \int_{-\infty}^\infty \Pr(X=x|Y=y) f(y) dy \]
The Law of total expectation is similar.
\[ E[X] = \sum_y E[X|Y=y] \Pr(Y=y) \] \[ E[X] = \int_{-\infty}^\infty E[X|Y=y] f(y) dy \]
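As a numerical check, consider a made-up mixture: \(Y\sim\text{Bernoulli}(0.3)\), with \(X|Y=0 \sim \text{Poisson}(2)\) and \(X|Y=1 \sim \text{Poisson}(5)\). The law of total expectation gives \(E[X] = 0.7\cdot 2 + 0.3\cdot 5 = 2.9\), and simulation agrees:

```r
# Law of total expectation, checked by simulation (arbitrary example).
set.seed(2)
n <- 1e5
y <- rbinom(n, size = 1, prob = 0.3)           # Y ~ Bernoulli(0.3)
x <- rpois(n, lambda = ifelse(y == 1, 5, 2))   # X | Y is Poisson
mean(x)            # simulated E[X]
0.7 * 2 + 0.3 * 5  # E[X|Y=0] Pr(Y=0) + E[X|Y=1] Pr(Y=1) = 2.9
```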
Two random variables are independent if their joint probability factorizes into the product of their marginal probabilities, \[ \Pr(X=x,Y=y) = \Pr(X=x) \Pr(Y=y) \] or \(f(x,y) = f(x)f(y)\).
This is the classic coin-flipping distribution, taking value 0 or 1.
The random variable \(X\) has Bernoulli\((p)\) distribution if \(\Pr(X=1)=p\) and \(\Pr(X=0)=1-p\).
A Binomial\((n,p)\) random variable is a sum of \(n\) independent Bernoulli\((p)\) variables. The probability mass function is \[ \Pr(X=k) = \binom{n}{k} p^k (1-p)^{n-k} \] This is useful for modeling the number of events that occur in \(n\) trials, with fixed probability of success.
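The pmf formula agrees with R's built-in dbinom (n and p here are arbitrary):

```r
# Binomial pmf by hand vs. dbinom, for n = 10, p = 0.3.
n <- 10; p <- 0.3; k <- 0:n
by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)
all.equal(by_hand, dbinom(k, size = n, prob = p))  # TRUE
sum(by_hand)                                       # pmf sums to 1
```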
A common probability model for count data is the Poisson\((\lambda t)\) distribution, \[ \Pr(X = k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!} \] This distribution looks weird, but it has a nice story, which we will see when we talk about Poisson processes.
The uniform distribution is usually defined on \([0,1]\). \(X \sim \text{Unif}(0,1)\) means the density is \[ f(x) = 1 \] for \(0\le x \le 1\). The uniform distribution represents a non-informative (flat) distribution over its support.
If \(X\) has beta distribution, the density is \[ f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)} \] for \(0\le x \le 1\), where \(B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)\). This distribution is useful for modeling probabilities or proportions.
If \(X\) has exponential distribution, the density is \[ f(x) = \lambda e^{-\lambda x} \] This distribution is useful for waiting times to an event. \(E[X] = 1/\lambda\).
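A simulation with an arbitrary rate confirms the mean, and also illustrates the exponential's famous memorylessness (conditional on surviving past time 5, the remaining wait is again Exponential\((\lambda)\)):

```r
# Exponential waiting times: sample mean matches E[X] = 1/lambda.
set.seed(3)
lambda <- 0.2
w <- rexp(1e5, rate = lambda)
mean(w)             # close to 1/0.2 = 5
mean(w[w > 5] - 5)  # memorylessness: also close to 5
```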
The normal\((\mu,\sigma^2)\) density is \[ f(x) = \frac{1}{\sqrt{2\pi} \sigma} \exp\left[ -\frac{1}{2\sigma^2} (x-\mu)^2 \right] \] This is the classic “Gaussian” or “bell-shaped” distribution, often used to model random noise or error about a mean value.
There are lots of other “canonical” probability distributions:
Discrete: geometric, negative binomial, beta-binomial
Continuous: Gamma, Log-Normal, Laplace, failure time distributions
We won’t use these in this class, but it’s important to know that there are lots of options for modeling random quantities. This is partly what the field of statistics is about.
In this course, we will focus on Markov stochastic models in discrete and continuous time. A Markov model is one whose next event depends only on its current state, and not on its prior history (Bailey (1990), Renshaw (2015)).
Let \(X(t)\) be a Markov process taking \(n\) different states, in discrete time, \(t=1,2,\ldots\). Define the matrix of one-step transition probabilities from each state \(i\) to each state \(j\),
\[ \mathbf{P} = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \end{pmatrix} \]
The rows of \(\mathbf{P}\) sum to one. The transition probability of moving from \(i\) to \(j\) in time \(t\) is
\[ P_{ij}(t) = \Pr(X(t) = j | X(0) = i) \]
Compute transition probabilities by multiplying \(\mathbf{P}\) by itself \(t\) times:
\[ P_{ij}(t) = [\mathbf{P}^t]_{ij} \]
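As a sketch, here is a hypothetical two-state chain in R, with the \(t\)-step probabilities computed by repeated matrix multiplication:

```r
# Hypothetical two-state chain: P[i, j] is the one-step probability
# of moving from state i to state j; rows sum to one.
P <- matrix(c(0.9, 0.1,
              0.5, 0.5), nrow = 2, byrow = TRUE)
Pt <- diag(2)
for (s in 1:20) Pt <- Pt %*% P  # Pt = P^20, the 20-step probabilities
round(Pt, 4)   # each row is near the long-run distribution (5/6, 1/6)
rowSums(Pt)    # rows still sum to one
```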
We described deterministic systems in terms of their rates earlier. But a continuous-time stochastic model produces different output each time we run it. How can we describe its rates of change when its path is different every time?
We need to describe the rate of change of its transition probabilities. Consider a continuous-time Markov chain \(X(t)\) on a discrete state space.
Let the transition rate from state \(i\) to state \(j\) be \(\lambda_{ij}\), and define the rate matrix
\[ \mathbf{Q} = \begin{pmatrix}
-\lambda_{11} & \lambda_{12} & \cdots & \lambda_{1n} \\
\vdots & \vdots & \ddots & \vdots \\
\lambda_{n1} & \lambda_{n2} & \cdots & -\lambda_{nn}
\end{pmatrix}
\]
where \(\lambda_{ii} = \sum_{j\neq i} \lambda_{ij}\), so each row of \(\mathbf{Q}\) sums to zero. The transition probabilities \(P_{ij}(t) = \Pr(X(t)=j|X(0)=i)\) can be computed by the matrix exponential \(\mathbf{P}(t) = \exp[\mathbf{Q} t]\).
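A small sketch in R, using expm() from the Matrix package and made-up rates, checks the matrix exponential against the known closed form for a two-state chain, \(P_{12}(t) = \frac{a}{a+b}\left(1 - e^{-(a+b)t}\right)\):

```r
# Two-state CTMC with made-up rates a (1 -> 2) and b (2 -> 1).
library(Matrix)  # for expm(); the Matrix package ships with R
a <- 0.3; b <- 0.7; t <- 2
Q <- matrix(c(-a,  a,
               b, -b), nrow = 2, byrow = TRUE)
Pt <- as.matrix(expm(Matrix(Q * t)))       # P(t) = exp(Q t)
Pt[1, 2]                                   # from the matrix exponential
a / (a + b) * (1 - exp(-(a + b) * t))      # same value from the closed form
```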
Consider a continuous-time Markov model at time \(t\), where \(X(t)=i\). The rate of jumping to state \(j\) is \(\lambda_{ij}\). What does this mean?
It means that the jump to \(j\) occurs after a waiting time with distribution Exponential\((\lambda_{ij})\). If there are several possible destinations \(j\) with \(\lambda_{ij}>0\), then the waiting time to the next jump to any state is the minimum of all these exponential waiting times. Fortunately,
\[ \min\{W_{ij}:\ j\neq i\} \sim \text{Exponential}\left(\sum_{j\neq i} \lambda_{ij}\right) \]
Furthermore, the destination of the next jump is independent of this waiting time.
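A simulation with made-up rates illustrates both facts: the minimum waiting time is exponential with the summed rate, and destination \(j\) wins with probability \(\lambda_{ij}/\sum_k \lambda_{ik}\):

```r
# Competing exponential clocks with arbitrary rates.
set.seed(4)
rates <- c(0.5, 1.0, 1.5)
n <- 1e5
w <- sapply(rates, function(r) rexp(n, rate = r))  # n x 3 waiting times
wmin <- pmin(w[, 1], w[, 2], w[, 3])  # time of the first jump
mean(wmin)                            # close to 1 / sum(rates) = 1/3
winner <- max.col(-w)                 # index of the smallest waiting time
mean(winner == 3)                     # close to 1.5 / 3 = 0.5
```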
Several properties of CTMCs are useful:
Drawbacks:
Now consider a continuous-time Markov chain whose only positive transition rates are to the next integer state, \(\lambda_{i,i+1}=\lambda\). This is called a counting process. Let \(X(t)=i\), so the waiting time to the jump to \(i+1\) is
\[ W_i \sim \text{Exponential}(\lambda) \]
following the last event. Given \(X(0)=0\), what is the distribution of the number of jumps \(X(t)\)?
\[ X(t) \sim \text{Poisson}(\lambda t) \]
So \(E[X(t)] = \lambda t\). This is where the Poisson distribution comes from.
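We can check this in R by simulating the counting process directly (the rate, horizon, and sample sizes are arbitrary):

```r
# Build a Poisson process from exponential waiting times, then count
# the jumps by time t_end; counts should be Poisson(lambda * t_end).
set.seed(5)
lambda <- 2; t_end <- 10
count_jumps <- function() {
  jump_times <- cumsum(rexp(500, rate = lambda))  # 500 >> expected 20 jumps
  sum(jump_times <= t_end)
}
x <- replicate(1e4, count_jumps())
mean(x)  # close to lambda * t_end = 20
var(x)   # Poisson: variance is also close to 20
```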
A queue is a model for a list of individuals waiting to be served. Examples:
The simplest queue is a continuous-time Markov model with arrival rate \(\lambda_{i,i+1}=\lambda\) and service rate \(\lambda_{i,i-1}=\mu\) (the M/M/1 queue).
A slightly more complicated queue has \(\lambda_{i,i+1}=\lambda\) and \(\lambda_{i,i-1}=i\mu\), so all \(i\) individuals are served in parallel (the M/M/\(\infty\) queue).
Queues are useful because they can be combined into a stochastic model for sequential movement through a series of compartments or states.
Gonsalves et al. (2017) use this approach in a flow-based model of the HIV care continuum in the United States.
The “Gillespie algorithm” performs forward simulation of a stochastic compartmental model (Pineda-Krch (2008)).
You don’t really need to know how it works. Forward simulation of continuous-time Markov models is computationally straightforward.
The ssa function in the GillespieSSA package implements a fast version of the Gillespie algorithm.
library(GillespieSSA)
parms <- c(beta = 0.005, gamma = 0.1)
x0 <- c(S = 499, I = 1, R = 0)
a <- c("beta*S*I", "gamma*I")
nu <- matrix(c(-1, 0,
               +1, -1,
               0, +1), nrow = 3, byrow = TRUE)
out <- ssa(x0, a, nu, parms, tf = 100, simName = "SIR model")
plot(out$data[, 1], out$data[, 2], col = "green",
    ylim = c(0, 500), type = "l", xlab = "Time",
    ylab = "Number", bty = "n")
lines(out$data[, 1], out$data[, 3], col = "red")
lines(out$data[, 1], out$data[, 4], col = "blue")
legend(50, 500, c("S", "I", "R"), pch = 16,
    col = c("green", "red", "blue"))
The basic call is ssa(x0, a, nu, parms, tf). Here, x0 is the starting state, a is the vector of transition rates (propensities), nu is the matrix of state changes for each transition, parms is the vector of parameters, and tf is the final time.
The function returns a data frame with jump times and states.
head(out$data)
## timeSeries   S I R
## 0.00000000 499 1 0
## 0.07328629 498 2 0
## 0.14305226 497 3 0
## 0.60422483 496 4 0
## 0.94886495 495 5 0
## 0.98264998 494 6 0
“Statistical models” are stochastic because they contain random variables. Consider the usual Normal/Gaussian linear model \[ y = \alpha + \beta x + \epsilon \] where \(\epsilon \sim N(0,\sigma^2)\). This model is stochastic because \(\epsilon\) is random.
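A quick simulation with made-up coefficients shows the stochastic part in action: the same \(x\) values yield different \(y\) values on each run, yet lm() recovers the underlying \(\alpha\) and \(\beta\):

```r
# Simulate from y = 1 + 2x + eps with eps ~ N(0, 0.25), then refit.
set.seed(6)
n <- 1e4
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
coef(lm(y ~ x))  # close to (alpha, beta) = (1, 2)
```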
In my opinion, you should think about so-called statistical models as mechanistic.
Bailey, Norman T. 1990. The Elements of Stochastic Processes with Applications to the Natural Sciences. Vol. 25. John Wiley & Sons.
Gonsalves, Gregg S, A David Paltiel, Paul D Cleary, Michael J Gill, Mari M Kitahata, Peter F Rebeiro, Michael J Silverberg, et al. 2017. “A Flow-Based Model of the HIV Care Continuum in the United States.” Journal of Acquired Immune Deficiency Syndromes 75 (5): 548–53.
Pineda-Krch, Mario. 2008. “GillespieSSA: Implementing the Stochastic Simulation Algorithm in R.” Journal of Statistical Software 25 (12): 1–18.
Renshaw, Eric. 2015. Stochastic Population Processes: Analysis, Approximations, Simulations. Oxford University Press.